Methods for Optimization and Regularization of Generative Models
This thesis studies the problem of regularizing and optimizing generative models, often using insights and techniques from kernel methods. The work proceeds in three main themes.

Conditional score estimation. We propose a method for estimating conditional densities based on a rich class of RKHS exponential family models. The algorithm works by solving a convex quadratic problem to fit the gradient of the log density, the score, thus avoiding the need to estimate the normalizing constant. We show the resulting estimator to be consistent and provide convergence rates when the model is well-specified.

Structuring and regularizing implicit generative models. In a first contribution, we introduce a method for learning Generative Adversarial Networks, a class of Implicit Generative Models, using a parametric family of Maximum Mean Discrepancies (MMD). We show that controlling the gradient of the critic function defining the MMD is vital for obtaining a sensible loss function. Moreover, we devise a method to enforce exact, analytical gradient constraints. As a second contribution, we introduce and study a new generative model suited to data with low intrinsic dimension embedded in a high-dimensional space. This model combines two components: an implicit model, which can learn the low-dimensional support of the data, and an energy function, which refines the probability mass by importance sampling on the support of the implicit model. We further introduce algorithms for learning such a hybrid model and for efficient sampling.

Optimizing implicit generative models. We first study the Wasserstein gradient flow of the Maximum Mean Discrepancy in a non-parametric setting and provide smoothness conditions on the trajectory of the flow that ensure global convergence. We identify cases where this condition does not hold and propose a new algorithm based on noise injection to mitigate the problem.
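The score-matching idea in the first theme, fitting the score by a convex quadratic problem with no normalizing constant, can be sketched in a toy setting. The sketch below uses a simple finite-dimensional feature model in place of the RKHS exponential family, and the Gaussian data and quadratic features are illustrative assumptions, not the thesis's actual estimator:

```python
import numpy as np

# Sketch: score matching for a 1-D exponential-family model
# s(x) = theta . psi(x), with psi(x) = (1, 2x), the derivative of features (x, x^2).
# Minimising the Hyvarinen objective E[0.5 * s(x)^2 + s'(x)] is a convex
# quadratic in theta with closed form theta* = -E[psi psi^T]^{-1} E[psi'],
# so no normalizing constant is ever computed.
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=1.0, size=200_000)  # data from N(1, 1)

psi = np.stack([np.ones_like(x), 2.0 * x], axis=1)               # feature gradients
dpsi = np.stack([np.zeros_like(x), 2.0 * np.ones_like(x)], axis=1)

A = psi.T @ psi / len(x)        # E[psi psi^T]
b = dpsi.mean(axis=0)           # E[psi']
theta = -np.linalg.solve(A, b)  # closed-form minimiser of the quadratic

# The true score of N(1, 1) is 1 - x, i.e. theta should be near (1.0, -0.5).
print(theta)
```

The point of the example is that the fitted object is the score itself, so the intractable partition function of the exponential family never appears.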
In a second contribution, we consider the Wasserstein gradient flow of generic loss functionals in a parametric setting. This flow is invariant to the model's parameterization, just like the Fisher gradient flow in information geometry. It has the additional benefit of being well defined even for models with varying supports, which makes it particularly well suited to implicit generative models. We then introduce a general framework for approximating the Wasserstein natural gradient by leveraging a dual formulation of the Wasserstein pseudo-Riemannian metric, which we restrict to a Reproducing Kernel Hilbert Space. The resulting estimator is scalable and provably consistent, as it relies on Nyström methods.
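Since the scalability of the estimator rests on Nyström methods, a minimal sketch of the underlying idea, a low-rank approximation of a kernel matrix from a random subset of landmark points, may be helpful. The Gaussian kernel, bandwidth, and landmark count below are assumptions for illustration:

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=2.0):
    # Pairwise Gaussian kernel matrix between rows of X and rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
idx = rng.choice(len(X), size=50, replace=False)  # m = 50 landmark points

K_nm = gaussian_kernel(X, X[idx])       # n x m cross-kernel
K_mm = gaussian_kernel(X[idx], X[idx])  # m x m landmark kernel
# Rank-m Nystrom estimate of the full n x n kernel matrix:
K_approx = K_nm @ np.linalg.pinv(K_mm) @ K_nm.T

K_full = gaussian_kernel(X, X)
err = np.linalg.norm(K_full - K_approx) / np.linalg.norm(K_full)
print(err)  # small relative error at a fraction of the O(n^2) cost
```

The approximation needs only the n-by-m and m-by-m blocks, which is what makes Nyström-based estimators scale to large sample sizes.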
Truth Bounties: A Market Solution to Fake News
False information poses a threat to individuals, groups, and society. Many people struggle to judge the veracity of the information around them, whether that information travels through newspapers, talk radio, TV, or social media. Concerned with the spread of misinformation and harmful falsehoods, much of the policy, popular, and scholarly conversation today revolves around proposals to expand the regulation of individuals, platforms, and the media. While more regulation may seem inevitable, it faces constitutional and political hurdles. Furthermore, regulation can have undesirable side effects and be ripe for abuse by powerful actors, public and private.
This Article presents an alternative for fighting misinformation that avoids many pitfalls of regulation: truth bounties. We develop a contractual mechanism that would enable individuals, media, and others to pledge money to support the credibility of their communications. Any person could claim the bounty by presenting evidence of the falsity of the communication before a dedicated body of private arbitrators. Under the system we envision, anyone consuming information on the internet would know immediately if a given communication had a bounty attached, whether the communication had been challenged, and whether the challenge succeeded or failed. As John Stuart Mill recognized, we can trust our grasp of the truth only by putting it to the fire of challenge. Truth bounties open the challenge to all.
Generalized Energy Based Models
We introduce the Generalized Energy Based Model (GEBM) for generative
modelling. These models combine two trained components: a base distribution
(generally an implicit model), which can learn the support of data with low
intrinsic dimension in a high dimensional space; and an energy function, to
refine the probability mass on the learned support. Both the energy function
and base jointly constitute the final model, unlike GANs, which retain only the
base distribution (the "generator"). GEBMs are trained by alternating between
learning the energy and the base. We show that both training stages are
well-defined: the energy is learned by maximising a generalized likelihood, and
the resulting energy-based loss provides informative gradients for learning the
base. Samples from the posterior on the latent space of the trained model can
be obtained via MCMC, thus finding regions in this space that produce better
quality samples. Empirically, the GEBM samples on image-generation tasks are of
much better quality than those from the learned generator alone, indicating
that all else being equal, the GEBM will outperform a GAN of the same
complexity. When using normalizing flows as base measures, GEBMs succeed on
density modelling tasks, returning comparable performance to direct maximum
likelihood of the same networks.
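The core idea of refining a base distribution with an energy function can be sketched with self-normalised importance sampling. This is a rough illustration only: the energy, the sign convention, and the resampling step are assumptions for the toy example, and the trained GEBM actually draws samples via MCMC on the latent space of the base:

```python
import numpy as np

# Sketch: refine a base sampler with an energy function so the reweighted
# points follow a density proportional to base(x) * exp(-E(x)).
rng = np.random.default_rng(0)

def energy(x):
    # Hypothetical energy for illustration: penalise the region x < 0.
    return np.where(x < 0, 5.0, 0.0)

base = rng.normal(size=10_000)   # stand-in for samples from an implicit base
logw = -energy(base)             # unnormalised log importance weights
w = np.exp(logw - logw.max())
w /= w.sum()                     # self-normalise

# Resample according to the energy-refined weights.
refined = rng.choice(base, size=10_000, replace=True, p=w)
print(base.mean(), refined.mean())  # mass shifts toward low-energy regions
```

The base proposes points on the learned support; the energy then redistributes probability mass over that support, which is the division of labour the abstract describes.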
Rethinking Gauss-Newton for learning over-parameterized models
This work studies the global convergence and generalization properties of
the Gauss-Newton (GN) method when optimizing one-hidden-layer networks in the
over-parameterized regime. We first establish a global convergence result for
GN in the continuous-time limit exhibiting a faster convergence rate compared
to GD due to improved conditioning. We then perform an empirical study on a
synthetic regression task to investigate the implicit bias of GN's method. We
find that, while GN is consistently faster than GD in finding a global optimum,
the performance of the learned model on a test dataset is heavily influenced by
both the learning rate and the variance of the randomly initialized network's
weights. Specifically, we find that initializing with a smaller variance
results in a better generalization, a behavior also observed for GD. However,
in contrast to GD where larger learning rates lead to the best generalization,
we find that GN achieves an improved generalization when using smaller learning
rates, albeit at the cost of slower convergence. This study emphasizes the
significance of the learning rate in balancing the optimization speed of GN
with the generalization ability of the learned solution.
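For reference, a Gauss-Newton iteration for nonlinear least squares takes the form theta <- theta - (J^T J + damping)^{-1} J^T r, where r is the residual vector and J its Jacobian. The sketch below applies this to a toy curve-fitting problem (an assumption for illustration, not the paper's one-hidden-layer network):

```python
import numpy as np

# Sketch: damped Gauss-Newton for min_theta 0.5 * ||r(theta)||^2.
# Toy residual: fit y = exp(a * x) + b with true (a, b) = (0.5, 1.0).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.exp(0.5 * x) + 1.0

def residual(theta):
    a, b = theta
    return np.exp(a * x) + b - y

def jacobian(theta):
    a, b = theta
    # Columns: d r / d a and d r / d b.
    return np.stack([x * np.exp(a * x), np.ones_like(x)], axis=1)

theta = np.array([0.0, 0.0])
for _ in range(20):
    r, J = residual(theta), jacobian(theta)
    # GN step with a small damping term for numerical stability.
    step = np.linalg.solve(J.T @ J + 1e-8 * np.eye(2), J.T @ r)
    theta = theta - step
print(theta)  # should converge near (0.5, 1.0)
```

The J^T J term is what gives GN its improved conditioning over plain gradient descent, which in turn drives the faster convergence the abstract reports.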
Efficient Wasserstein Natural Gradients for Reinforcement Learning
A novel optimization approach is proposed for application to policy gradient methods and evolution strategies for reinforcement learning (RL). The procedure uses a computationally efficient Wasserstein natural gradient (WNG) descent that takes advantage of the geometry induced by a Wasserstein penalty to speed optimization. This method follows the recent theme in RL of including a divergence penalty in the objective to establish a trust region. Experiments on challenging tasks demonstrate improvements in both computational cost and performance over advanced baselines.
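The trust-region idea of penalising movement away from the current policy can be illustrated in one dimension. For two Gaussian policies with the same variance, the squared 2-Wasserstein distance reduces to the squared difference of means, so the toy penalised surrogate below uses an exact Wasserstein penalty; the reward and penalty weight are assumptions for illustration, not the paper's method:

```python
# Sketch: a divergence-penalised (trust-region) policy update in 1-D.
# Policy: N(mu, s) with fixed s, so W2(N(mu, s), N(mu_old, s))^2 = (mu - mu_old)^2.

def surrogate(mu, mu_old, lam=1.0):
    reward = -(mu - 2.0) ** 2            # toy objective, maximised at mu = 2
    penalty = lam * (mu - mu_old) ** 2   # exact W2^2 between the two Gaussians
    return reward - penalty

mu_old = 0.0
lam = 1.0
# Closed-form maximiser of the penalised surrogate:
# d/dmu [-(mu - 2)^2 - lam * (mu - mu_old)^2] = 0  =>  mu = (2 + lam * mu_old) / (1 + lam)
mu_new = (2.0 + lam * mu_old) / (1.0 + lam)
print(mu_new)  # 1.0: the penalty keeps the update inside a trust region
```

The penalty stops the update halfway to the unconstrained optimum at mu = 2; exploiting the geometry of such a Wasserstein penalty efficiently is what WNG descent targets.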
Annealed Flow Transport Monte Carlo
Annealed Importance Sampling (AIS) and its Sequential Monte Carlo (SMC)
extensions are state-of-the-art methods for estimating normalizing constants of
probability distributions. We propose here a novel Monte Carlo algorithm,
Annealed Flow Transport (AFT), that builds upon AIS and SMC and combines them
with normalizing flows (NFs) for improved performance. This method transports a
set of particles using not only importance sampling (IS), Markov chain Monte
Carlo (MCMC) and resampling steps, as in SMC, but also NFs that are learned
sequentially to push particles towards the successive annealed targets.
We provide limit theorems for the resulting Monte Carlo estimates of the
normalizing constant and expectations with respect to the target distribution.
Additionally, we show that a continuous-time scaling limit of the population
version of AFT is given by a Feynman--Kac measure which simplifies to the law
of a controlled diffusion for expressive NFs. We demonstrate experimentally the
benefits and limitations of our methodology on a variety of applications.
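The AIS backbone that AFT builds on can be sketched without the learned flows: particles start at a tractable base, move through a sequence of annealed targets with MCMC, and accumulate incremental importance weights whose average estimates the normalizing constant. The specific base, target, temperature schedule, and Metropolis step size below are assumptions for the toy example:

```python
import numpy as np

# Sketch: plain Annealed Importance Sampling (no normalizing flows).
# Anneal from a N(0, 1) base to the unnormalised target
# gamma(x) = exp(-(x - 3)^2 / 0.5), whose true normalising constant
# is sqrt(pi / 2) ~= 1.2533.
rng = np.random.default_rng(0)

log_base = lambda x: -0.5 * x**2 - 0.5 * np.log(2 * np.pi)
log_gamma = lambda x: -((x - 3.0) ** 2) / 0.5

n, betas = 5_000, np.linspace(0.0, 1.0, 51)
x = rng.normal(size=n)          # particles initialised from the base
logw = np.zeros(n)

for b0, b1 in zip(betas[:-1], betas[1:]):
    anneal = lambda z, b: (1 - b) * log_base(z) + b * log_gamma(z)
    logw += anneal(x, b1) - anneal(x, b0)   # incremental IS weights
    prop = x + 0.5 * rng.normal(size=n)     # random-walk Metropolis move
    accept = np.log(rng.uniform(size=n)) < anneal(prop, b1) - anneal(x, b1)
    x = np.where(accept, prop, x)

# Stabilised average of the importance weights estimates the constant.
Z_hat = np.exp(logw - logw.max()).mean() * np.exp(logw.max())
print(Z_hat)  # should be close to sqrt(pi / 2) ~= 1.2533
```

AFT replaces part of the work done here by MCMC moves with sequentially learned flow transports (plus resampling, as in SMC), which is what improves performance on hard targets.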